ABSTRACT


Hands-on  performance  tests  are  the 
benchmark  against  which  other  measures 
of  proficiency  should  be  compared. 
However,  hands-on  performance  tests  are 
expensive,  time-consuming,  and  sometimes 
dangerous  to  personnel  or  equipment. 

This  paper  analyzes  the  relationship 
between  hands-on  performance  tests  and 
job-knowledge  tests  to  propose  better 
methods  for  developing  job-knowledge 
tests  and  to  determine  when  job- 
knowledge  tests  could  best  be  used  in 
place  of  hands-on  performance  tests. 


EXECUTIVE  SUMMARY 


The  Marine  Corps  must  periodically  assess  the  ability  of  its 
personnel  to  perform  mission-related  tasks.  The  best  criterion  of 
mission-related proficiency would be a hands-on performance test (HOPT),
because  of  its  validity  and  objectivity.  HOPTs  have  been  established  as 
the  benchmarks  against  which  other  measures  should  be  compared. 

However,  HOPTs  are  expensive,  time-consuming,  and  sometimes  dangerous  to 
personnel  or  equipment. 

In  certain  circumstances,  job-knowledge  tests  (JKTs)  can  be 
inexpensive  substitutes  (i.e.,  surrogates)  for  hands-on  performance 
tests. Previous research suggests that JKTs show promise as surrogates for
diagnosis  of  some  training  needs,  but  these  studies  do  not  fully  analyze 
the  strengths  and  weaknesses  of  job-knowledge  tests. 

The  current  study  extends  previous  studies  to  indicate  the 
conditions under which job-knowledge tests should be used as surrogates
and  to  provide  guidance  for  developing  better  job-knowledge  tests. 

Both  HOPTs  and  JKTs  measure  job  proficiency  with  some  degree  of 
error.  Ideally,  scores  on  the  HOPTs  and  JKTs  should  be  the  same;  that 
is,  JKT  items  should  have  the  same  difficulty  level  as  HOPT  tasks.  Such 
equality  should  be  present  for  the  total  test  score  and  extend  to  scores 
at the duty-area level. Duty-area scores provide informative comparisons
that may be useful for examining training needs.

Discrepancies  between  HOPTs  and  JKTs  should  be  minimized,  because 
discrepancies  can  lead  to  misinterpretation  of  training  needs.  This 
research  memorandum  uses  "sample-difficulty"  analyses  to  quantify  causes 
of HOPT-JKT discrepancies. These analyses postulate HOPT-JKT discrepancies
to be the result of the following two factors, plus error:


$D_{\mathrm{HOPT-JKT}} = \mathrm{Sampling} + \mathrm{Difficulty} + \mathrm{Error}$

On the right side of the equation, the first term indicates discrepancies
that result from item sampling. These occur when the proportion
of  items  differs  from  the  proportion  of  tasks  within  a  duty  area.  For 
example,  sampling  discrepancies  arise  when  there  are  no  JKT  items  to 
represent  a  HOPT  task,  or  when  there  is  an  overabundance  of  items 
representing  a  single  task. 

The  second  term  refers  to  differences  in  the  difficulties  of  JKT 
items  compared  to  HOPT  tasks.  "Difficulty"  is  simply  the  average 
proportion  of  HOPT  steps  performed  correctly  or  the  average  fraction  of 
examinees  who  respond  correctly  to  JKT  items.  The  difficulty  component 
compares  a  task  with  the  items  written  to  represent  that  task.  The 
"error"  term  represents  all  other  differences  that  cannot  be  explained 
by  the  two  other  components.  Sampling  discrepancies  could  be  corrected 
by allocating items to be proportional to the number of HOPT tasks;
difficulty  discrepancies  could,  in  many  cases,  be  corrected  by  better 
item  writing.  In  some  cases,  difficulty  discrepancies  are  an 
unavoidable  result  of  differences  in  the  skills  required  by  hands-on  as 
opposed  to  paper-and-pencil  tests. 

This study analyzed data from more than 1,900 first-term Marine
infantrymen.  Sample-difficulty  (S-D)  analyses  found  that  the  most 
common  and  significant  reasons  for  HOPT-JKT  discrepancies  were 
differences  in  difficulty.  Further  analyses  were  therefore  focused  on 
the  quality  of  item  writing. 

Analyses of JKT item-total correlations determined that about 19 of
the 150 core job-knowledge items lacked proper measurement properties.
Deletion of these items improved overall HOPT-JKT correspondence
by approximately 15 to 20 percent.

Job-knowledge items were often more difficult than the actual
hands-on task. This occurred when items asked for knowledge
that was not required to perform the task, or when items were geared
toward leadership functions not yet encountered by first-term Marines. Simpler
item  formats  and  better  task  analyses  are  recommended  to  avoid  these 
problems  with  job-knowledge  tests. 

Conversely,  it  was  found  that  traditional  multiple-choice  job- 
knowledge  items  could  be  easier  than  the  corresponding  tasks  if  the 
items reduce complex activities to a simple choice between alternatives.
For example, abilities to perform complicated activities such as
building  field-expedient  antennas  are  probably  better  measured  using 
formats  other  than  traditional  multiple  choice. 

In  summary,  the  following  steps  should  be  taken  to  improve  job- 
knowledge tests:

•  The  proportion  of  items  should  more  nearly  reflect  the 
percentage  of  tasks  in  the  duty  area. 

•  A  larger  number  of  job-knowledge  items  should  be  written. 

•  Items with low item-total correlations should be revised
or  deleted  after  thorough  pilot  testing. 

•  Task  analyses  should  determine  the  appropriate  difficulty 
level  of  questions  and  determine  whether  alternative 
formats  are  necessary  to  assess  each  HOPT  skill. 

•  Task  analyses  should  determine  what  steps  or  aspects  of  a 
duty  area  should  be  assessed  by  means  of  a  job-knowledge 
test. Critical, knowledge-dependent steps should be
emphasized on a job-knowledge test.

Analyses in this paper used the HOPT as the benchmark against which
the JKT should be compared; hence, error associated with the HOPT is
ignored. Job-knowledge tests might be a better benchmark if the objective
of testing is to determine Marines' knowledge as opposed to "can
do"  abilities.  Nevertheless,  taken  as  a  whole,  this  and  previous 
research  indicate  that  JKTs  are  most  appropriate  for  measuring 
proficiency  in  the  following  tasks: 

•  Knowledge-driven tasks that require memory for specific
facts and attention to detail among complex alternatives.
Some skills in land navigation are knowledge-driven.

•  Reading-dependent  tasks  in  which  the  JKT  is  a  close 
approximation  to  actual  job  requirements. 

•  Time-independent tasks in which the actual job performance
allows sufficient time to recall information. Some
maintenance tasks are quite time-independent.

In contrast, physical-coordination tasks, very difficult tasks, and
time-critical tasks are usually inappropriate to measure with a
multiple-choice JKT. Extremely difficult tasks should not be measured
by means of multiple choice, because guessing provides a lower bound on
measured proficiency levels. The paper indicates that physical-
coordination and time-critical tasks should be measured by HOPTs or
high-fidelity  simulations.  Some  aspects  of  complex  construction  tasks 
can  be  measured  using  alternative  item  formats.  However,  the  critical 
concern  is  that  the  measurement  be  an  accurate  reflection  of  an 
individual's  proficiency  level  and  not  be  contaminated  by  reading  or 
writing  abilities  that  can  highly  influence  performance  on  job-knowledge 
tests.
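
The lower bound that guessing places on multiple-choice scores can be illustrated with a minimal sketch. It assumes a simple blind-guessing model that is not taken from this paper: an examinee who cannot perform a task picks one of the alternatives at random. The function name and parameters are illustrative only.

```python
def expected_mc_score(true_proficiency: float, n_alternatives: int = 4) -> float:
    """Expected proportion correct on multiple-choice items.

    Assumption (illustrative, not from the paper): examinees who do not
    know the answer guess uniformly at random among the alternatives.
    """
    if not 0.0 <= true_proficiency <= 1.0:
        raise ValueError("true_proficiency must lie between 0 and 1")
    if n_alternatives < 2:
        raise ValueError("an item needs at least two alternatives")
    guessed_share = (1.0 - true_proficiency) / n_alternatives
    return true_proficiency + guessed_share


# Even with no proficiency at all, a four-alternative item is expected
# to be answered correctly one time in four.
for p in (0.0, 0.1, 0.5, 1.0):
    print(p, expected_mc_score(p))
```

Under this model, very low true proficiency cannot show up as a score much below 25 percent on four-alternative items, which is why extremely difficult tasks are poorly separated by a multiple-choice JKT.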

INTRODUCTION


Periodically,  the  Marine  Corps  must  assess  the  ability  of  each  of 
its  personnel  to  perform  mission-related  tasks.  This  research 
memorandum  analyzes  how  Marines'  proficiencies  could  be  measured  more 
accurately.  It  compares  the  results  of  hands-on  performance  tests 
(HOPTs)  and  job-knowledge  tests  (JKTs),  then  recommends  ways  in  which 
better  job-knowledge  tests  could  be  developed.  This  research  also 
studies  the  circumstances  in  which  a  JKT  is  most  appropriate. 

The  best  criterion  for  measuring  relative  training  success  would  be 
a  HOPT  because  of  its  validity  and  objectivity.  The  National  Academy  of 
Sciences  Committee,  which  provided  scientific  oversight  of  the  Job- 
Performance  Measurement  Project,  has  recommended  use  of  HOPTs  as  the 
benchmarks against which other measures should be compared [1].

However,  HOPTs  are  expensive,  time-consuming,  and  sometimes  dangerous  to 
personnel  or  equipment. 

In  certain  circumstances,  JKTs  can  be  inexpensive  substitutes 
(i.e.,  surrogates)  for  HOPTs  within  a  duty  area.  A  duty  area  is  a 
domain  of  job  performance  defined  by  Individual  Training  Standards,  such 
as  land  navigation  or  tactical  measures.  To  be  used  as  a  surrogate,  a 
JKT should provide the same profile of duty-area strengths as would a
HOPT [2,3]. In other words, the difficulty of each duty area should be
the  same  for  JKTs  and  HOPTs.  As  indicated  in  the  quote  below,  the 
National  Academy  of  Sciences  Committee  has  concluded  that  hands-on 
measures  of  enlisted  performance  are  the  benchmark  against  which 
surrogates  such  as  JKTs  should  be  compared. 

. . .  the  project  consensus  is  that  the  most  direct 
measures  of  job  behaviors  have  the  greatest  likelihood 
of  meeting  validity  requirements.  Although  it  is  not 
universally  applicable,  the  hands-on  job  sample  test 
is  the  measure  of  job  proficiency  with  the  greatest 
fidelity  to  actual  job  performance.  For  each  military 
occupational  specialty  under  study,  therefore,  a 
hands-on  test  will  be  developed.  But  because  such 
tests  are  very  expensive  to  administer  and,  for 
reasons  of  time,  cost,  and  safety,  can  only  sample  a 
small number of tasks in a given MOS, an important
objective  of  the  project  is  to  develop  additional 
"surrogate"  measures  for  each  MOS  that  are  cheaper 
and  more  feasible  for  large-scale  administration.  The 
hands-on  measure  will  serve  as  the  benchmark  to  which 
the  surrogate  measures  must  compare  favorably  if  they 
are  to  be  endorsed  by  the  Joint-Service  Project 

[1, p. 6].

Ideally,  there  should  be  correspondence  between  the  proportion 
correct  of  the  HOPT  and  JKT  within  a  duty  area.  If  the  proportion 
correct  deviates  between  HOPT  and  JKT,  a  discrepancy  occurs.  This 
research analyzes the reasons for such discrepancies and proposes
methods  to  minimize  them.  Previous  research  has  shown  that,  in  the 
infantry  occupational  field,  job-knowledge  tests  can  be  useful  for 
diagnosing  training  needs  in  some,  but  not  all,  duty  areas  [2,  4,  5]. 

The current research uses sample-difficulty (S-D) analyses, modified
from Cooke [6], to quantify the degree to which discrepancies are the
result of two factors, plus error, as follows:

$D_{\mathrm{HOPT-JKT}} = \mathrm{Sampling} + \mathrm{Difficulty} + \mathrm{Error}$

On  the  right  side  of  the  equation,  the  first  term  indicates 
discrepancies  that  result  from  item  sampling.  These  occur  when  the 
proportion  of  items  differs  from  the  proportion  of  tasks  within  a  duty 
area.  For  example,  sampling  discrepancies  arise  when  there  are  no  JKT 
items to represent a HOPT task, or when there is an overabundance of
items  representing  a  single  task. 

The  second  term  refers  to  differences  in  the  difficulties  of  JKT 
items  compared  to  HOPT  tasks.  "Difficulties"  are  simply  the  average 
proportion  of  HOPT  steps  performed  correctly  or  the  fraction  of 
examinees  who  respond  correctly  to  a  JKT  item.  The  difficulty  component 
compares  a  task  with  the  items  written  to  represent  that  task.  The 
"error"  term  represents  all  other  differences  that  cannot  be  explained 
by the two other components.

This research next makes recommendations about better test development
techniques that could minimize the observed sample and difficulty
discrepancies. Additional analyses are performed to determine specific
content areas that are most appropriately assessed by means of JKTs.

METHOD 

Subjects 

HOPTs and JKTs were administered to more than 1,900 first-term
Marines in four infantry specialties. Over 1,000 riflemen, 300 machine
gunners, 300 mortarmen, and 300 assaultmen took part in this research.
Individuals to be tested were randomly selected from the available
Marine Corps for each Military Occupational Specialty (MOS). The sample
was stratified by pay grade, length of service, and educational level.
Two hundred of the riflemen were retested with the alternate form of the
performance and job-knowledge tests to determine the test-retest
reliability of the testing procedures. Further description of the
sample is provided in reference [7].

Measures and Reliability

Hands-On  Performance  Tests 

The first task in developing job-performance measures was to define
the requirements of Marine Corps enlisted infantrymen for each MOS.
Individual  Training  Standards  (ITS)  prepared  by  the  Marine  Corps  were 
the  primary  source  of  detailed  information  about  the  tasks  required  in 
each  MOS.  Analyses  of  the  ITS  were  conducted  to  ensure  that  tasks 
selected  for  testing  would  maximize  the  coverage  of  job  behaviors.  In 
this  manner,  hands-on  test  scores  would  generalize  to  the  full  range  of 
infantry  job  requirements.  Appendix  A  and  reference  [8]  provide  further 
details  of  the  test  construction  procedures. 

HOPTs  were  developed  for  the  selected  test  content.  These  tests 
were reviewed by Marine Corps job experts. They were then trial-tested
and improved before a large-scale tryout was conducted with more than
200 Marines. Table 1 provides an overview of the duty areas covered in
the test of infantry skills, and further details regarding test
development can be found in appendix A. Test-administrator training was
conducted for two weeks, during which time test administrators learned
to  perform  all  tasks  and  to  score  performance  according  to  objective 
criteria. 


Table  1.  Examples  of  duty  areas  and  tests  included  in  hands-on 
performance  tests  of  infantry  skills 


Duty Area                        Examples of Tests

Tactical measures                Call for/adjust indirect fire
Security and intelligence        Process prisoners
M16A2 service rifle              Live fire at pop-up targets
M203 grenade launcher            Prepare for firing
Hand grenades                    Throw dummy grenades
Mines                            Install Claymore mines
Communication                    Assemble and operate radio
Land navigation                  Determine location
First aid                        Treat sucking chest wound
Nuclear, biological, chemical    Prepare NBC-1 report
Light antitank weapon (LAW)      Prepare to fire
Night vision                     Operations inspection
Squad automatic weapon (SAW)     Fieldstrip SAW


Internal  consistency  reliabilities  for  the  HOPT  were  fairly  high 
overall  (varying  between  .88  and  .83,  depending  on  the  MOS)  [2,7,9],  but 
somewhat  low  by  duty  area,  as  shown  in  table  2.  These  reliabilities 
were high enough to use the HOPT as a benchmark against which the JKT
could be judged, although lack of reliability can cause HOPT-JKT
discrepancies.


Table  2.  Reliability  estimates  of  the  job  knowledge  test  and 
HOPT  by  duty  area 


                                 Core JKT            HOPT
Duty area                     Form A  Form B    Form A  Form B

Land navigation                 .65     .65       .77     .69
Security and intelligence       .65     .63       .36     .27
Communications                  .61     .56       .70     .65
Grenade launcher                .50     .36       .31     .31
Tactical measures               .50     .68       .45     .35
LAW                             .45     .46       .76     .75
SAW                             .46     .61       .61     .59
NBC                             .42     .37       .54     .57
Mines                           .43     .35       .85     .89
Hand grenade                    .15     .13       NA      NA
Night vision                    .14     NA        .51     .70
First aid                       .03     .20       .65     .58

NOTE:  Cronbach internal consistency reliabilities for areas marked "NA"
could not be computed because there was only one item on that scale.

Job-Knowledge Tests

Development  procedures  for  the  JKTs  followed  those  for  the  HOPTs 
and are detailed elsewhere [2, appendix B]. Overall internal
consistency  reliabilities  of  the  JKT  were  fairly  high,  varying  from  .90 
to  .87,  depending  on  the  MOS.  Reliabilities  by  duty  area  were  moderate 
to  low  (table  2).  The  low  reliabilities,  which  were  partly  the  result 
of  having  too  few  items  for  each  duty  area,  limited  HOPT-JKT 
correlations  by  duty  area  and  prevented  useful  determination  of  each 
individual's  duty-area  strengths  and  weaknesses  on  the  basis  of  the  JKT 
[2]. Correlations by duty area ranged from a high of .46 for land
navigation to a low of .04 for the night-vision device (table 3). However, the
low reliabilities do not necessarily limit the JKT for determining
overall proficiencies [2].
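
The way low reliabilities limit HOPT-JKT correlations can be sketched with the standard attenuation bound from classical test theory: the correlation between two observed scores cannot exceed the square root of the product of their reliabilities. This bound is a textbook result, not a computation reported in this paper, and the function name is illustrative. The reliabilities below are the Form A values from table 2.

```python
import math


def correlation_ceiling(reliability_x: float, reliability_y: float) -> float:
    """Upper bound on an observed correlation given two reliabilities.

    Classical test theory result (assumed here, not taken from the paper):
    r_xy <= sqrt(r_xx * r_yy).
    """
    return math.sqrt(reliability_x * reliability_y)


# (Core JKT Form A, HOPT Form A) reliabilities transcribed from table 2.
form_a_reliabilities = {
    "Land navigation": (0.65, 0.77),
    "Mines": (0.43, 0.85),
    "Night vision": (0.14, 0.51),
    "First aid": (0.03, 0.65),
}

for duty_area, (jkt_rel, hopt_rel) in form_a_reliabilities.items():
    ceiling = correlation_ceiling(jkt_rel, hopt_rel)
    print(f"{duty_area}: ceiling about {ceiling:.2f}")
```

Duty areas with very low JKT reliability, such as night vision and first aid, have low ceilings regardless of how well the item content matches the hands-on tasks.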

Table 3.  Job-knowledge test--hands-on
correlations by duty area


Land navigation              .46
NBC                          .28
Tactical measures            .29
Communications               .25
LAW                          .23
Grenade launcher             .22
First aid                    .20
Security and intelligence    .18
Mines                        .16
SAW                          .11
Hand grenades                .05
Night vision                 .04

RESULTS 

Duty-Area  Analyses 

Analyses  of  Form  A  and  Form  B  HOPT-JKT  correspondence  are  presented 
in  figures  1  and  2.  These  figures  show  discrepancies  between  the 
average  percentage  of  HOPT  steps  performed  correctly  and  the  average 
percentage  of  examinees  answering  JKT  items  correctly,  by  duty  area. 
Positive values show that the JKT was more difficult than the HOPT (see
footnote 1). Ideally, HOPT-JKT discrepancies would be zero, corresponding
to the solid line shown in these figures. It would also be ideal for
differences between forms (contrasts between figures 1 and 2) to be
negligible.

Both  figures  show  that  the  job-knowledge  test  was  more  difficult 
than  the  hands-on  performance  test  for  most  duty  areas.  For  Form  A,  the 
job-knowledge  test  was  about  as  difficult  as  the  HOPT  for  hand  grenades, 
land  navigation,  and  SAW;  conversely,  night  vision  and  first-aid  items 
were much more difficult than the corresponding HOPT tasks. Form B
showed  mines  as  an  area  where  the  JKT  was  easier  than  the  HOPT;  the  JKT 
was  much  harder  for  the  night-vision  duty  area. 


1.  "Difficulty"  for  the  HOPT  is  the  average  percentage  of  task  steps 
performed  correctly;  for  the  JKT,  "difficulty"  is  the  average  percentage 
of  examinees  responding  correctly  to  the  JKT  items.  When  the  HOPT  is 
easier than the JKT, the average percentage of HOPT-task steps performed
correctly  is  higher  than  the  average  percentage  of  examinees  responding 
correctly  to  the  JKT  items,  so  discrepancy  values  will  be  positive. 

Figure 1.  Overall HOPT-JKT discrepancies (Form A)
[Figure not reproduced. Vertical axis: HOPT-JKT discrepancy; horizontal axis: duty area.]

Figure 2.  Overall HOPT-JKT discrepancies (Form B)
[Figure not reproduced. Vertical axis: HOPT-JKT discrepancy; horizontal axis: duty area.]

Some of these HOPT-JKT discrepancies could be attributed to
unreliability, as shown in table 2. It is not surprising that the night-
vision duty area, which had virtually no reliability or correlation with
the HOPT, was an area where the JKT and the HOPT diverged considerably.
However, communications, security and intelligence, and tactical
measures also had large discrepancies despite moderate correlations with
HOPTs.

For the three duty areas with large form discrepancies (first aid,
mines, NBC; see the footnote on form discrepancies), such differences
were partially the result of differences
in test content. For example, emplacing and recovering the Claymore
mine  was  based  on  an  electronic  detonation  for  Form  A  versus  tripwire 
detonation  for  Form  B.  Marines  are  much  less  proficient  in  emplacing 
and  recovering  Claymore  mines  with  tripwires  than  with  electronic 
devices.  All  other  duty  areas  had  similar  test  content,  and  no  other 
significant  form  discrepancies  were  noted. 

Duty-Area Sample-Difficulty Analyses

Two  reasonable  hypotheses  for  explaining  duty-area  HOPT-JKT 
discrepancies  are  that  the  job-knowledge  test  items  did  not  provide 
proportional  coverage  of  duty-area  tasks  ("sampling");  or  the  difficulty 
level of the JKT items was different from corresponding HOPT tasks
("difficulty").

The first source of discrepancies, "sampling," is responsible for a
discrepancy  when  the  proportions  of  items  within  a  duty  area  differ  from 
the  proportions  of  tasks  within  a  duty  area.  For  example,  sampling 
discrepancies  arise  when  there  are  no  JKT  items  to  represent  a  HOPT 
task,  or  when  there  is  an  overabundance  of  items  representing  a  single 
task.  The  second  source,  "difficulty,"  is  responsible  for  a  discrepancy 
when  a  JKT  item  is  either  harder  or  easier  than  the  task  it  is  supposed 
to  represent. 

The  quantification  of  the  relative  importance  of  these  two  sources 
of discrepancies was based on "shift-share" analyses used by Cooke [6].
Appendix  C  explains  the  mathematical  background  of  these  analyses. 

To begin S-D analyses, each hands-on performance task was reviewed
and matched to items on the JKT to produce a "crosswalk" between the
HOPT criterion and the JKT surrogate. Matches were made by reviewing
each item and deciding which task it represented. Next, the reverse
procedure was employed (tasks were matched to items) to confirm task-item
pairings. Appendix D shows the names and average percentage
correct of tasks and their matching items. Note that some tasks had no
corresponding items (e.g., convert azimuth), and vice versa.

2.  Form discrepancies are differences in the HOPT-JKT discrepancy
between Form A and Form B. For example, the HOPT-JKT discrepancy for
mines was about +.06 for Form A, but -.12 for Form B. These form
discrepancies primarily reflect differences in test content, but they
also reflect differences in the examinees who used the forms.


Table 4 shows the results of S-D analyses by duty area and form.
The "sample" component represents the extent to which the proportions
of items within a duty area differ from the proportions of tasks within
that duty area. It will be positive if the easier HOPT tasks
constitute a larger proportion of the HOPT total score than the
corresponding JKT items. For example, in Night Vision on Form A
(appendix C), "visual inspection" was the easiest task (67.9 percent
correct),  and  the  second-easiest  was  "operations  inspection"  (65.3 
percent  correct).  Each  of  these  easier  tasks  accounted  for  33.3  percent 
of  the  total  HOPT  score,  but  the  corresponding  items  contributed  only 
25.0  percent  to  the  total  JKT  score.  The  positive  sign  indicates  that 
this  sampling  tended  to  make  the  total  HOPT  score  higher  than  the  JKT. 

The  "difficulty"  component  represents  the  difference  in  difficulty 
between  the  item  and  the  task  the  item  is  supposed  to  represent. 
Difficulty  is  the  average  percentage  of  HOPT  steps  done  correctly  for  a 
task,  or  the  average  percentage  of  examinees  responding  correctly  to  the 
matching  items.  The  difficulty  component  will  be  positive  if  HOPT  tasks 
tend  to  be  easier  than  the  average  of  corresponding  JKT  items.  For 
Night  Vision  on  Form  A,  all  HOPT  tasks  were  easier  than  the  matched  JKT 
items,  so  the  difficulty  component  was  positive. 

In table 4, the "error" component indicates whether items
discrepant in difficulty were also different in proportion sampled. In
practice, this interaction is indistinguishable from error. It will be
large and negative if the biggest difficulty differences were for tasks
that also had substantial sampling discrepancies and the signs of the
discrepancies match. For Communications on Form A, the error component
is large and negative because the two easiest tasks (with large positive
difficulty discrepancies) also had no matching items (i.e., a substantial
positive sampling difference).
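
The three components described above can be sketched with one common form of a shift-share decomposition. This is an assumption for illustration: the exact formulas used in this research are given in appendix C, which is not reproduced here, and may differ. In the sketch, each task has a weight in the HOPT duty-area score, a weight for its matching items in the JKT score, and a proportion correct on each test. All names and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TaskRow:
    hopt_weight: float   # share of the HOPT duty-area score from this task
    jkt_weight: float    # share of the JKT duty-area score from matching items
    hopt_correct: float  # proportion of HOPT steps performed correctly
    jkt_correct: float   # proportion correct on the matching JKT items


def sd_decompose(rows):
    """Split the HOPT-JKT discrepancy into sampling, difficulty, and an
    interaction term (assumed shift-share form, not the paper's appendix C)."""
    sampling = sum((r.hopt_weight - r.jkt_weight) * r.hopt_correct for r in rows)
    difficulty = sum(r.hopt_weight * (r.hopt_correct - r.jkt_correct) for r in rows)
    # Negative when tasks with positive difficulty gaps are also oversampled
    # on the HOPT relative to the JKT (signs match).
    interaction = -sum(
        (r.hopt_weight - r.jkt_weight) * (r.hopt_correct - r.jkt_correct)
        for r in rows
    )
    total = sum(r.hopt_weight * r.hopt_correct for r in rows) - sum(
        r.jkt_weight * r.jkt_correct for r in rows
    )
    return {"sampling": sampling, "difficulty": difficulty,
            "error": interaction, "total": total}


# Hypothetical two-task duty area.
rows = [
    TaskRow(hopt_weight=0.5, jkt_weight=0.25, hopt_correct=0.8, jkt_correct=0.6),
    TaskRow(hopt_weight=0.5, jkt_weight=0.75, hopt_correct=0.6, jkt_correct=0.5),
]
parts = sd_decompose(rows)
print(parts)
```

In this form the three components sum exactly to the total discrepancy, and the interaction term carries the negative sign described in the text when sampling and difficulty discrepancies point the same way.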

In general, table 4 indicates that difficulty differences are
responsible for the single largest share of HOPT-JKT discrepancies;
that is, item difficulties were different from the tasks they were supposed
to represent. When the absolute values of the "sampling," "difficulty,"
and "error" rows were added, an average of 49.6 percent of the absolute
discrepancies were accounted for by the difficulty factor. Sampling and
error (which includes the sampling-difficulty interaction) accounted for
17.0 percent and 33.4 percent of the discrepancies, respectively.
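
The share calculation can be checked directly from the rounded entries in table 4. The sketch below sums the absolute values of each component across both forms and expresses each sum as a percentage of the grand total. The data are transcribed from table 4; the function and variable names are illustrative.

```python
# (duty area, sampling, difficulty, error), transcribed from table 4.
FORM_A = [
    ("Communications", 0.19, 0.22, -0.30), ("First aid", 0.02, 0.22, -0.06),
    ("Grenade launcher", 0.01, 0.07, -0.01), ("Hand grenades", 0.00, 0.03, 0.00),
    ("LAW", 0.02, 0.11, -0.02), ("Land navigation", 0.03, 0.10, -0.10),
    ("Mines", 0.04, 0.01, 0.01), ("NBC", 0.07, 0.30, -0.26),
    ("Night vision", 0.02, 0.23, -0.02), ("SAW", -0.01, 0.15, -0.12),
    ("Security & Intel.", 0.03, 0.25, -0.17), ("Tactical measures", -0.09, 0.19, 0.00),
]
FORM_B = [
    ("Communications", 0.15, 0.12, -0.18), ("First aid", 0.05, 0.07, -0.08),
    ("Grenade launcher", 0.01, 0.08, 0.00), ("Hand grenades", 0.00, -0.01, 0.00),
    ("LAW", 0.03, 0.11, -0.02), ("Land navigation", 0.13, 0.18, -0.26),
    ("Mines", 0.02, -0.13, -0.01), ("NBC", 0.00, 0.16, -0.17),
    ("Night vision", 0.01, 0.51, -0.33), ("SAW", 0.00, 0.20, -0.13),
    ("Security & Intel.", 0.30, 0.20, -0.34), ("Tactical measures", -0.09, 0.19, 0.00),
]


def absolute_shares(rows):
    """Sum absolute component values and convert them to percentage shares."""
    totals = {"sampling": 0.0, "difficulty": 0.0, "error": 0.0}
    for _, sampling, difficulty, error in rows:
        totals["sampling"] += abs(sampling)
        totals["difficulty"] += abs(difficulty)
        totals["error"] += abs(error)
    grand_total = sum(totals.values())
    shares = {name: 100.0 * value / grand_total for name, value in totals.items()}
    return totals, shares


totals, shares = absolute_shares(FORM_A + FORM_B)
for name in totals:
    print(f"{name}: total {totals[name]:.2f}, share {shares[name]:.1f}%")
```

The rounded table entries reproduce the "total absolute" row (1.32, 3.84, and 2.59). The resulting shares agree with the reported percentages to within about a tenth of a point, a gap that would be consistent with the original figures having been computed from unrounded values.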

In  a  few  duty  areas,  the  error  component  had  a  large  impact, 
counteracting  otherwise  large  HOPT-JKT  discrepancies.  For  Form  A, 
Communications,  NBC,  and  Security  and  Intelligence  would  have  been  much 
more  discrepant  had  it  not  been  for  a  large  amount  of  error.  For  Form 
B, error counteracted discrepancies for Night Vision, and Security and
Intelligence.

Table 4.  Results of sample-difficulty analyses of JKT differences from
the HOPTs


                      Sampling    Difficulty    Error        Total
                      Component   Component     Component    Discrepancy

Form A

Communications         +.19        +.22          -.30         +.11
First aid              +.02        +.22          -.06         +.18
Grenade launcher       +.01        +.07          -.01         +.07
Hand grenades          0.00        +.03          0.00         +.03
LAW                    +.02        +.11          -.02         +.11
Land navigation        +.03        +.10          -.10         +.03
Mines                  +.04        +.01          +.01         +.06
NBC                    +.07        +.30          -.26         +.11
Night vision           +.02        +.23          -.02         +.23
SAW                    -.01        +.15          -.12         +.04
Security & Intel.      +.03        +.25          -.17         +.11
Tactical Measures      -.09        +.19          0.00         +.10

Form B

Communications         +.15        +.12          -.18         +.09
First aid              +.05        +.07          -.08         +.04
Grenade launcher       +.01        +.08          0.00         +.09
Hand grenades          0.00        -.01          0.00         -.01
LAW                    +.03        +.11          -.02         +.12
Land navigation        +.13        +.18          -.26         +.05
Mines                  +.02        -.13          -.01         -.12
NBC                    0.00        +.16          -.17         -.01
Night vision           +.01        +.51          -.33         +.19
SAW                    0.00        +.20          -.13         +.07
Security & Intel.      +.30        +.20          -.34         +.16
Tactical Measures      -.09        +.19          0.00         +.10

TOTAL ABSOLUTE         1.32        3.84          2.59         7.75
                      (17.0%)     (49.6%)       (33.4%)      (100.0%)

NOTE:  Positive values indicate that the respective component (i.e.,
factor) results in the total hands-on average being higher than
the total job-knowledge average; negative values indicate that
the component makes the HOPT average lower. The sampling
component will be positive if the easier HOPT tasks constitute
a larger proportion of the HOPT total score than the
corresponding JKT items. The difficulty component is positive
if HOPT tasks are easier than the average of corresponding
items. The error component includes the systematic interaction
of sampling with difficulty and random error. It will be
large and negative if the biggest difficulty differences were for
tasks that also had substantial sampling discrepancies, and the
signs of the discrepancies match.

Item-Level Discrepancy Analyses


S-D  analyses,  described  above,  compared  average  item  difficulties 
to HOPT percentage correct at the duty-area level. This section
presents item-level discrepancy analyses. Since the difficulty
component was found to account for the largest proportion of duty-area
HOPT-JKT  discrepancies,  additional  analyses  were  conducted  to  account 
for  why  these  occur.  To  supplement  the  S-D  analyses  of  average  item 
difficulties  just  described,  single  item-level  discrepancy  analyses  were 
conducted  using  the  crosswalk  described  above  and  shown  in  appendix  D. 

For each JKT item, the item's percentage correct was subtracted from
the average percentage of steps correct for the corresponding HOPT
task. Positive discrepancy values therefore refer to cases in which
the HOPT task was easier than the corresponding JKT item. Negative
values indicate cases where the HOPT was more difficult than the job-
knowledge item. Appendix E lists the items from lowest to highest
discrepancy values.
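
The item-level calculation can be sketched as follows, using the sign convention that a positive value means the HOPT task was easier than the item. The crosswalk, task names, item names, and percentages below are hypothetical placeholders, not values from appendix D or E.

```python
def item_discrepancies(crosswalk, hopt_pct, jkt_pct):
    """Return (item, discrepancy) pairs sorted from lowest to highest.

    discrepancy = HOPT task percentage of steps correct
                  minus JKT item percentage correct.
    """
    rows = [
        (item, hopt_pct[task] - jkt_pct[item])
        for item, task in crosswalk.items()
    ]
    return sorted(rows, key=lambda row: row[1])


# Hypothetical crosswalk: each JKT item is matched to one HOPT task.
crosswalk = {"ITEM_1": "TASK_A", "ITEM_2": "TASK_A", "ITEM_3": "TASK_B"}
hopt_pct = {"TASK_A": 80.0, "TASK_B": 55.0}   # percentage of steps correct
jkt_pct = {"ITEM_1": 60.0, "ITEM_2": 85.0, "ITEM_3": 70.0}

ranked = item_discrepancies(crosswalk, hopt_pct, jkt_pct)
for item, value in ranked:
    print(f"{item}: {value:+.1f}")
```

Items at the top of the sorted list are those that were easier than their matched task; items at the bottom were harder than their matched task.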

Figure  3  illustrates  the  results  of  plotting  task  percentage  of 
steps  correct  (values  are  labeled  along  the  y-axis)  versus  the  average 
percentage  of  examinees  responding  correctly  to  the  item  (values  along 
the  x-axis).  Tasks  lying  along  the  diagonal  line  were  of  the  same 
difficulty  as  the  corresponding  item.  Distance  from  the  diagonal  line 
indicates the degree of item-task discrepancy. Tasks above the line
were  easier  than  the  corresponding  item,  and  those  below  the  line  were 
more  difficult  than  the  matched  item. 

Figure  3  indicates  that  there  is  considerable  variation  not 
accounted for by duty areas. For example, although the average discrepancy
for the land navigation area is quite low, several items concerning
determination of location by map-terrain association and setting an
azimuth at night were considerably easier than the performance of the
corresponding HOPT tasks. Similarly, although JKT performance in the
communications duty area was generally worse than the performance of
corresponding HOPT tasks, items for one task, constructing a field-
expedient antenna, were much easier than the HOPT.

Substantive  Analyses  of  Item-Task  Differences 

The analyses so far indicate that difficulty is a more important
factor than sampling problems in determining HOPT-JKT discrepancies.
The following section explores three reasons for these discrepancies in
item difficulty:

•  The  item  writing  was  misleading  or  the  item  context  was 
missing. 

•  The  wrong  skill  is  measured  by  the  item. 

•  The  item  was  written  for  a  skill  level  different  from 
that  possessed  by  the  examinees. 
Figure 3. Plot of HOPT percentage correct vs. JKT item percentage correct


Misleading  or  Unclear  Context 

It  is  expected  that  even  if  Marines  do  not  know  the  correct  answer, 
they  will  guess  the  correct  answer  roughly  25  percent  of  the  time  for  a 
four-alternative  test.  If  the  percentage  of  correct  responses  is 
considerably  below  the  chance  level,  it  is  suspected  that  the  item  might 
have  been  misleading  in  some  way.  Appendix  F  lists  items  that  had  less 
than  a  chance  level  of  correct  responses  (25  percent)  for  either  Form  A 
or Form B. Three items stand out as possibly misleading: LAW1, TM17,
and FA4.
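
The chance-level screen described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names and the `margin` parameter (how far below chance counts as "considerably below") are hypothetical and do not come from the study. The proportions in the example are the Form A values quoted in this section.

```python
def chance_level(n_alternatives):
    """Expected proportion correct from blind guessing."""
    return 1.0 / n_alternatives


def below_chance(proportions, n_alternatives=4, margin=0.10):
    """Return items whose proportion correct falls well below chance.

    proportions maps item label -> proportion correct.
    margin is an assumed cutoff, not a value taken from the report.
    """
    floor = chance_level(n_alternatives) - margin
    return sorted(item for item, p in proportions.items() if p < floor)


form_a = {"LAW1": 0.094, "TM17": 0.13, "FA4": 0.13, "CM9": 0.628}
print(below_chance(form_a))
```

Items returned by such a screen are candidates for review, not automatically faulty; as the FA4 example below shows, a fair item can also score below chance.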

LAW1, answered correctly 9.4 percent of the time for Form A, asked
what you should do first if the LAW does not fire. The correct answer
was "squeeze the trigger," but the alternative "wait 10 seconds and fire
again" was considerably more popular. In fact, waiting 10 seconds and
firing again is the second step; it should be performed only if
resqueezing the trigger does not work. In this case, Marines might have
assumed that the trigger had already been resqueezed, so the context of
the question was unclear. This shows one danger of using a job-knowledge
test to measure all aspects of a job: the paper-and-pencil item asked
the Marine to think about a reflex response. It is inadvisable to use
multiple-choice items to assess reflexes.

TM17, answered correctly 13 percent of the time for Form A, was
written to reflect the task "Control Unit When Not In Contact." This
question asked when the Marine should use an alternate route from an
objective, with two of the answer choices being (c) as needed, to avoid
contact with the enemy, and (d) when you used the primary route to reach
the objective. In this case, the correct answer was "d." This question
was somewhat misleading because the purpose of using the alternate route
is to avoid contact with the enemy, so "c" was very often chosen. In
this sense, the response choices were not mutually exclusive.

A third example was the CPR item (FA4). This item asked, "When giving
CPR, how often should you check for breathing and pulse?" Choices were
(a) after each compression/cycle, (b) after every two cycles, (c) after
every three cycles, and (d) after every four cycles. The correct answer
was after every four cycles. In this case, it seems like a fair
question, although Marines apparently did not know the answer; it was
answered correctly only 13 percent of the time for Form A.

Low item-total correlations also indicate misleading items, because
low correlations indicate that those who did best overall were choosing
an alternative other than the one on the answer key. Table 5 shows
items that had low correlations with total score. It is noteworthy that
the proportion of successful responses for these items was nearly always
at the chance level; 10 of the items were also in the list of items with
lowest frequency of correct responses.
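
An item-total correlation of the kind discussed above can be computed as in the following sketch. It assumes each response is scored 1 (correct) or 0 (incorrect) and that the total score includes the item itself; the report does not state whether a corrected (item-excluded) total was used, so that choice and all names here are assumptions.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # correlation undefined; reported as zero by assumption
    return sxy / sqrt(sxx * syy)


def item_total_correlations(responses):
    """responses: one row per examinee, one 0/1 score per item."""
    totals = [sum(row) for row in responses]
    n_items = len(responses[0])
    return [pearson([row[j] for row in responses], totals)
            for j in range(n_items)]


# Toy data: four examinees, three items (values are illustrative only).
toy = [[1, 1, 0], [1, 1, 1], [0, 0, 1], [0, 0, 0]]
print([round(r, 2) for r in item_total_correlations(toy)])
```

An item whose correlation is near zero or negative is one that high scorers did not answer correctly more often than low scorers.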
Table 5. Items with low item-total correlations

             Form A                             Form B
                        Mean                               Mean
                        Proportion                         Proportion
Item     Correlation    Correct     Item    Correlation    Correct

FA5          .03          .22       LAW1        .00          .08
GL3          .05          .40       LAW14      -.04          .18
GL9          .03          .31       LN10       -.02          .21
LAW1         .05          .09       MI4B       -.05          .18
LAW14       -.05          .20       MI6B        .05          .28
LN6          .01          .20       TM11        .01          .20
NBC1B       -.07          .20       TM12        .01          .26
TM11         .01          .21       TM14       -.04          .28
TM12        -.04          .26       TM16        .02          .26
TM17         .03          .13       TM22        .02          .27
TM18         .03          .28       CM13        .04          .20
TM35         .05          .36       CM14        .01          .35
CM10         .02          .26
CM13        -.03          .21
CM14        -.03          .35


Table 6 shows that when low-correlation items were deleted, the
correspondence between the JKT and the HOPT increased. For Form A, the
correspondence increased more than 44 percent for those duty areas that
were changed, and 20 percent overall. For Form B, the improvement was
27 percent for duty areas that were changed, and 14 percent overall.

Items representing the ability to work on mines with tripwires
(Form B) were exceptions to the rule that deleting low-correlation items
improved HOPT-JKT correspondence. This was apparently because, in fact,
Marines had little proficiency in this hands-on skill. This anomaly
indicates that multiple-choice items are inadequate for measuring
proficiency in performing extremely difficult tasks, because random
guessing provides a lower bound on the proportion of correct
responses. In all other instances, deleting low-correlation items
increased the correspondence between JKTs and HOPTs.
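
The lower bound imposed by guessing can be illustrated with a simple model. The sketch below assumes that an examinee either knows the answer or guesses blindly among equally attractive alternatives; this model and the function name are assumptions introduced for illustration, not something the study fitted.

```python
def expected_proportion_correct(p_know, n_alternatives=4):
    """Expected proportion correct when non-knowers guess at random.

    p_know is the proportion of examinees who actually know the answer.
    """
    return p_know + (1.0 - p_know) / n_alternatives


# Even when no one knows the answer, a four-alternative item
# is still expected to be answered correctly 25 percent of the time.
for p in (0.0, 0.12, 0.50):
    print(p, round(expected_proportion_correct(p), 2))
```

Under this model an item cannot be expected to fall below 25 percent correct, whereas a hands-on task score can approach zero, so the two measures diverge for very difficult tasks.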
Table 6. HOPT-JKT discrepancy values after low item-total correlation items are omitted

The Wrong Skill Level Is Measured

Measuring a different skill level was a common reason for an item
to be discrepant from the HOPT. For example, items measuring complex
procedural tasks were often too easy. For constructing a
field-expedient antenna, the HOPT mean was 12 percent, but the item
means were generally much higher. The actual HOPT task was complex
(appendix F), requiring the ability to read detailed directions,
visualize performance steps, organize multiple pieces of equipment, and
demonstrate a moderate amount of manual dexterity. There are a number
of ways performance steps could be misinterpreted, even with help from
the manual. (All Marines were allowed to use manuals to assist with
this task.) In contrast, the corresponding item, CM9 (mean 62.8
percent), required no more than the ability to read a chart.

CM9, meant to measure the ability to construct a field-expedient
antenna, could be improved if it focused on a more appropriate step.
This item focused on a step that was probably one of the least difficult
in the entire task. Furthermore, an approach using fill-ins for "what
would you do next" would come closer to matching the difficulty of the
actual performance test.

Items  measuring  skill  at  complex  reasoning  tasks  were  also  often 
too  easy.  For  example,  setting  an  azimuth  at  night  requires  the  ability 
to  follow  complex  instructions,  visualize,  and  make  independent 
judgments.  On  the  other  hand,  a  corresponding  item  stressed  simple 
memory  of  definition  of  a  back  azimuth,  resulting  in  a  comparatively 
high pass rate (54.1 percent and 51.9 percent). It is apparently much
more  difficult  to  set  an  azimuth  at  night  than  to  remember  the 
definition  of  a  back  azimuth. 

Better  task  analysis  could  have  lessened  the  problems  involved  with 
measuring  complex  procedural  and  reasoning  tasks.  In  these  cases, 
distractors might have pinpointed common misunderstandings that can
interfere  with  hands-on  performance. 

When the wrong skill is measured by an item, the result could also
be that the JKT is too difficult. A striking example of this was for
night-vision items, which were much more difficult than the
corresponding HOPT tasks. Several night-vision tasks were relatively
straightforward and procedural, resulting in a moderate passing rate of
52 percent for the "clean components" task. In contrast, the
corresponding JKT items were overly detailed and asked for relatively
unimportant information that a Marine was unlikely to remember. The
following night-vision "clean components" item on the job-knowledge
test, with a 26 percent passing rate, illustrates these more difficult
questions:

Item 2: What should you use to clean the rubber eyeshield on the
AN/PVS-4 night sight?

A. Wet cloth

B. Lint-free cloth

C. Alcohol

D. Soft brush.

In  this  case,  the  item  asks  the  Marine  to  make  distinctions  among 
cleaning  tools  that  could  be  looked  up  in  the  manual.  The  item  makes 
the  task  too  difficult  by  asking  for  information  that  is  not  needed  for 
successful  task  performance. 

Items measuring time-critical tasks were sometimes too difficult.
As described above, the LAW item asking Marines to think about a reflex
action was too difficult. Tactical measures was another area in which
JKT items were often too difficult compared to HOPT performance. TM
item 17 was difficult (item mean 13 percent, task mean 76.8 percent).
This item asks for information about tactics that a first-term Marine
would not be required to know. In addition, the item is a poor
surrogate because, to a large extent, it measures reading comprehension
and carefulness rather than understanding of tactical measures: the
preposition "from" in the item stem ("from an objective") is crucial to
understanding that the correct answer is D. Furthermore, someone with
partial understanding is given no credit in the following "all or none"
example:

Item  17:  You  have  completed  a  detailed  patrol  plan,  and  selected 
an  alternate  route  from  an  objective.  You  should  use  that 
alternate  route 

A.  only  when  the  patrol  made  contact  with  the  enemy  on  the 
primary  route 

B.  only  when  the  patrol  leader  suspects  that  the  patrol  has 
been  detected 

C.  as  needed,  to  avoid  contact  with  the  enemy 

D.  when  you  used  the  primary  route  to  reach  the  objective. 

A Non-Matching Skill Level Is Measured

Many tactical measures (TM) items measured skill levels that were
more advanced than those yet acquired by most examinees. The TM tasks
often involved knowing instructions (e.g., hand signals for various
formations) that first-term Marines would know, but items focused more
on knowing the conditions under which certain formations should be used,
which is material unfamiliar to first-term Marines.

TM item 34 (Control Unit Movement When Not In Contact) also
measured a non-matching skill level. The item used a system of markings
that is unfamiliar to most first-term infantrymen (figure 4). The HOPT
task (appendix H) gives the infantryman more information about the
tactical situation, showing the situation on a map, whereas the item
gives only the ambiguous information "you want to advance rapidly across
a danger area against a known enemy position." The HOPT task is
comparatively easy because it asks a series of questions about
what first-term Marines should know (e.g., proper hand signals for
various formations), whereas the item focuses on knowledge that would be
available mostly to squad leaders (e.g., which formation to use). Note
that the HOPT task allows for partial knowledge (by scoring some steps
correctly and others incorrectly), whereas the JKT item is "all or
none." Lastly, note that the JKT item does not make clear in which
direction the squad leader should move his troops; therefore, important
context is missing.

CONCLUSIONS 

Implications  for  Development  of  Better  Job-Knowledge  Tests 

Better  Control  of  Item  Sampling 

An  immediate  implication  of  this  research  is  that  job-knowledge 
tests  should  more  adequately  sample  the  domain,  sometimes  with  more 
items.  This  research  showed  that  poor  item  sampling  or  the  interaction 
of  sampling  with  difficulty  was  responsible  for  about  50  percent  of  the 
discrepancies  between  JKT  and  HOPT  duty-area  totals.  To  the  extent 
possible,  the  proportion  of  items  for  a  duty  area  should  reflect  the 
proportion  of  tasks  performed  in  the  duty  area.  Otherwise,  JKT  and  HOPT 
averages  might  be  discrepant  solely  because  the  balance  of  JKT  items 
reflects  tasks  that  are  unrepresentative  of  the  duty  area  as  a  whole. 


1.  Some  JKT  items  and  HOPT  tasks  were  purposefully  developed  to  deal 
with content relatively unfamiliar to most first-term Marines. This was
done  to  determine  the  ability  of  Marines  to  respond  if  unfamiliar 
leadership  roles  were  thrust  upon  them  during  combat.  Although  these 
items  and  tasks  were  purposefully  developed,  they  sometimes  increased 
HOPT-JKT  discrepancies. 

You want to advance rapidly across a danger area, against a known
enemy position. As squad leader, which squad combat formation
should you set up?

Figure 4. Tactical Measures Item 34, Control Unit Movement When
Not in Contact


To  correct  the  problem  of  poor  item  sampling,  content  domain 
specifications  need  to  be  made  more  explicitly,  taking  care  to  give 
approximately  the  same  proportion  of  items  as  there  are  tasks  in  the 
content  domain.  Sometimes  this  will  require  that  a  larger  number  of 
items  be  written.  It  would  be  relatively  simple  to  keep  track  of  the 
optimum  number  of  items  per  task  given  information  on  number  of  tasks 
and  constraints  on  the  total  number  of  items  allowed.  For  example, 
Communications  for  Form  A  had  the  following  proportions  of  tasks  and 
items:


                                     HOPT Tasks        JKT Items
Task                                No.  Fraction     No.  Fraction    Discrepancy

Operations inspection                1   (.1111)       2   (.1429)       -.0318
Visual inspection                    1   (.1111)       1   (.0714)        .0397
Operate AN/PRC-77                    1   (.1111)       0   (.0000)        .1111
Assemble radio AN/PRC-77             1   (.1111)       1   (.0714)        .0397
Take immediate action                0   (.0000)       1   (.0714)       -.0714
Construct field expedient antenna    1   (.1111)       4   (.2857)       -.1746
Install telephone set                1   (.1111)       2   (.1429)       -.0318
Repair wire of TA-312                1   (.1111)       1   (.0714)        .0397
Operate TA-312                       1   (.1111)       2   (.1429)       -.0318
Check parts                          1   (.1111)       0   (.0000)        .1111

Total                                9   (1.0000)     14   (1.0000)

If the 14 items had been distributed so that "Construct Field
Antenna"  had  fewer  items,  and  at  least  one  item  had  been  written  for 
each  task,  sampling  would  have  improved  considerably.  The  discrepancy 
would  have  decreased  further  if  the  item  measuring  "Take  Immediate 
Action"  had  been  deleted,  since  there  was  no  corresponding  task. 
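
The bookkeeping described above can be sketched as a small allocation routine. This is one possible approach under stated assumptions, not the procedure used in the study: it gives every task at least one item, skips content with no corresponding task, and distributes the remaining items in proportion to task counts using a largest-remainder rule. All function and variable names are hypothetical; the task list is taken from the Communications example for Form A.

```python
def allocate_items(task_counts, total_items, min_per_task=1):
    """Spread total_items over tasks in proportion to task_counts."""
    tasks = [t for t, n in task_counts.items() if n > 0]  # drop non-tasks
    if total_items < min_per_task * len(tasks):
        raise ValueError("not enough items to cover every task")
    remaining = total_items - min_per_task * len(tasks)
    weight = sum(task_counts[t] for t in tasks)
    quotas = {t: remaining * task_counts[t] / weight for t in tasks}
    alloc = {t: min_per_task + int(quotas[t]) for t in tasks}
    # Hand out items still unassigned to the largest fractional remainders.
    leftover = total_items - sum(alloc.values())
    ranked = sorted(tasks, key=lambda t: quotas[t] - int(quotas[t]),
                    reverse=True)
    for t in ranked[:leftover]:
        alloc[t] += 1
    return alloc


def sampling_discrepancy(n_tasks, total_tasks, n_items, total_items):
    """Fraction of tasks minus fraction of items for one task."""
    return n_tasks / total_tasks - n_items / total_items


comm_tasks = {
    "Operations inspection": 1, "Visual inspection": 1,
    "Operate AN/PRC-77": 1, "Assemble radio AN/PRC-77": 1,
    "Take immediate action": 0, "Construct field expedient antenna": 1,
    "Install telephone set": 1, "Repair wire of TA-312": 1,
    "Operate TA-312": 1, "Check parts": 1,
}
plan = allocate_items(comm_tasks, 14)
print(plan)
print(round(sampling_discrepancy(1, 9, 4, 14), 4))  # antenna, as tested
```

With nine equally weighted tasks and 14 items, every task receives one or two items, so no single task dominates the duty-area average.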

Quality  Control  Procedures  for  Items 

Another important step in creating a better job-knowledge test
involves more complete integration of task analyses with item
development. Item writers should think about which steps are likely to be
performed incorrectly for lack of knowledge, and write items for those
steps. Ideally, distractors would include common errors made by
Marines. Item writers should write reasoning items for reasoning tasks,
and not simplify an unstructured task into a simple choice of
alternatives (as was done for setting an azimuth at night and determining
grid coordinates). Item writers should avoid contextual ambiguity in their
items and should be careful not to write items that stress mainly
reading ability, or which involve overly detailed knowledge that the Marine
is likely to look up in a manual (e.g., cleaning night-vision
equipment). Appendix I provides a group of questions that could be used
for item review.

Try-out testing is another important part of improving item
quality. More care needs to be taken to delete items that have low
item-total correlations. Improvements can be dramatic if such items are
deleted. To delete items that have low item-total correlations, it
would be necessary to "trial test" the job-knowledge test, compute the
correlations, delete the low-correlation items, and "trial test" the
remaining items a second time. Results from the second trial test
should confirm that all poor items have been eliminated.

Alternate Formats for Job-Knowledge Tests

For  skills  that  will  be  hard  to  measure  with  a  traditional 
multiple-choice  job-knowledge  test,  a  variety  of  alternative  formats  are 
available  (table  7).  Haladyna  [11]  suggests  these  alternatives  because 
they  can  require  complex  thinking  without  introducing  irrelevant 
response  alternatives.  Some  research  has  shown  that,  in  practical  test 
writing,  only  one  or  two  distractors  carry  most  of  the  burden. 

Alternate choice (table 7) is useful for avoiding overly detailed
distractors and decreasing the reading ability needed to answer the
question. Multiple true-false and testlets, also shown in table 7, can
simulate some of the complexity of lengthy procedures. Fill-in and
essay questions are also alternatives to traditional multiple choice.

If a job-knowledge item is intended to measure ability to perform
difficult procedural or reasoning tasks, use of question formats such as
"What is the correct way to...?," "Which is the most important?" and
"What would happen if...?" might also improve item validity (table 8,
from Haladyna [11]).


Table  7.  Examples  of  nontraditional  job-knowledge  test  item 
formats 


Alternate  Choice 

You  have  completed  a  detailed  patrol  plan  and  selected  an  alternate 
route  from  an  objective.  Under  which  condition  should  you  use  that 
alternate  route? 

1.  To  avoid  contact  with  the  enemy 

2.  When  you  used  the  primary  route  to  reach  the  objective. 

Multiple  True-False 

You  have  completed  a  detailed  patrol  plan  and  selected  an  alternate 
route  from  an  objective.  Which  of  the  following  are  conditions 
under  which  you  should  use  that  alternate  route? 

1.  When  the  patrol  made  contact  with  the  enemy  on  the  primary 
route 

2. When the patrol leader suspects that the patrol has been
detected 

3.  To  avoid  contact  with  the  enemy 

4.  When  you  used  the  primary  route  to  reach  the  objective. 

Item  Set  (Testlet) 

Suppose  you  have  to  set  an  azimuth  of  45  degrees  at  night.  You 
have  rotated  the  bezel  ring  until  the  luminous  line  is  directly 
over  the  black  index  line. 

1.  What  should  now  be  rotated? 

a.  Bezel  ring  b.  yourself 

2.  In  what  direction  should  rotation  be  accomplished? 

a.  clockwise  b.  counter-clockwise 

3.  How  much  rotation  is  needed? 

a.  9  clicks  b.  15  clicks 

Table 8. Examples of generic item shells for complex tasks

Applying

What is the correct way to...?
Background is given.
o What is the problem?
o What is the solution to the problem?
o How should the problem be solved?

Predicting

What would happen if...?
When..., what happens?
Under what circumstances would you expect...?

IMPLICATIONS  FOR  WHEN  A  JKT  IS  APPROPRIATE 

If attempts with nontraditional item formats are unsatisfactory,
alternatives to a job-knowledge test should be sought. Physical
coordination tasks such as firing a rifle and reflexive, time-critical
tasks such as taking immediate action are very difficult to measure
validly with a JKT. Analysis of the 50 largest item-task discrepancies
from appendix E suggests that complex reasoning tasks, such as setting
an azimuth at night, and highly procedural tasks, such as constructing a
field-expedient antenna, are also difficult to measure with a
job-knowledge test. In summary, the following types of tasks might be
inappropriate to test using a paper-and-pencil measure:


• Physical coordination tasks, such as firing a rifle at
pop-up targets, are impossible to measure with a
job-knowledge test [5].

• Time-critical, reflex tasks, such as responding to a LAW
that does not fire, might be impossible to measure with a
job-knowledge test.

• Complex procedural tasks that require multiple steps and
the ability to visualize the completed project, such as
construction of a field-expedient antenna, were also
difficult to test using multiple-choice tests.

• Complex reasoning tasks that require spatial orientation,
such as setting an azimuth at night, were difficult to
test using multiple-choice tests.

In contrast, the following tasks require skills for which testing using
a JKT is highly appropriate:


• Knowledge-driven tasks that require memory for specific
facts and attention to detail among complex alternatives.
These tasks require knowledge, but it is helpful not to
have to look the information up in a textbook.

• Reading-dependent tasks in which the JKT is a close
approximation to the form in which actual job performance
is required. Skills in using a technical manual are
especially appropriate for a JKT.

• Time-independent tasks in which the actual job
performance allows sufficient time to recall information;
some simple maintenance tasks might be fairly
time-independent.

DETAIL OF HOPT DEVELOPMENT PROCEDURES AND RELIABILITY


Development 

The first task in developing job-performance measures was to define
the requirements of Marine Corps enlisted infantrymen for each Military
Occupational Specialty (MOS). This was essential to ensure that tasks
would be selected to maximize the coverage of job behaviors so that
hands-on test scores would generalize to the full range of infantry job
requirements.

Extensive  job  analyses  using  Marine  Corps  manuals  on  the 
performance  of  job  tasks  were  conducted  to  define  hands-on  performance 
content.  Matrices  of  task  and  skill  requirements  for  each  MOS  were 
developed  and  reviewed  by  Marine  Corps  job  experts.  Test  content  was 
randomly  selected  from  these  matrices  so  that  scores  could  be 
generalized  to  the  full  performance  domain. 

Job tasks were organized into duty areas, and each duty area was
covered by one or more performance tests. The number of tasks performed
for each of the basic infantryman duty areas ranged from one (for
hand grenades) to nine (for land navigation).

Second,  job  experts  identified  specific  skills  and  knowledge 
required  to  perform  the  job  of  each  MOS.  Thus,  underlying  skills  and 
knowledge  common  across  MOSs  were  made  explicit.  This  procedure  allowed 
common  and  unique  skills  and  knowledge  to  be  sampled  across  MOSs. 

Hands-on performance tests (HOPTs) were developed for the selected
test content. These HOPTs were reviewed by Marine Corps job experts.
The tests were then trial-tested and improved before a large-scale
tryout was conducted with more than 200 Marines. Test tryout and
test-administrator training were conducted during the first two weeks of
August 1987. A full two-week training period for test administrators
was conducted because of the critical nature of the scorers' grading
judgments.

Detailed  task  analyses  of  the  selected  test  content  were  then 
conducted  to  identify  the  specific  steps  required  to  perform  each 
task.  Job  experts  and  job  incumbents  reviewed  the  task  analyses  to 
confirm  their  validity  as  accurate  descriptions  of  how  tasks  were 
actually  performed. 

Each task required the infantryman to perform a series of steps
that would be scored either "go" or "no-go." Some tasks had as few as 2
steps and others as many as 37, but most tasks contained approximately
10 steps.

Two testing forms were developed for each MOS. The number of tasks
given each participant ranged from 68 to 71 (for riflemen), 70 to 72
(for machine gunners), 72 to 75 (for mortarmen), and 76 to 80 (for
assaultmen).

Trial  tests  of  representative  tasks  for  the  rifleman  specialty  (MOS 
0311)  were  administered  to  more  than  200  Marines,  to  ensure  that  tasks 
could  be  completed  and  scored  under  actual  test  conditions.  The  tryout 
was  also  used  to  train  the  test  administrators  to  achieve  and  maintain 
equivalent  scoring  standards  across  testing  situations.  Tryout  of  the 
tests  immediately  followed  administrator  training. 

The  most  critical  component  of  hands-on  performance  measurement  is 
the  test  administrator.  Unlike  paper-and-pencil  tests  in  which  reliable 
and  objective  scoring  keys  are  easily  applied,  hands-on  administrators 
must  observe  and  make  judgments  concerning  whether  individuals  performed 
each  step  correctly.  Former  Marines  were  hired  to  serve  as  test 
administrators because of their familiarity with the test content,
knowledge  of  the  Marine  Corps,  and  ability  to  work  well  with  young 
Marines.  Because  test  administrators  were  retired,  they  did  not  have  a 
vested  interest  in  coaching  or  scoring  some  Marines  more  leniently. 

To ensure comparability of hands-on scoring across testing
locations, detailed training manuals were prepared, and the same testing
team conducted the training at each base. To monitor the scoring
accuracy and consistency of the test administrators, daily quality
control checks were implemented. Hands-on data were entered into a
computer daily so that administrators could be checked for leniency or
drift.  Immediate,  specific  feedback  was  given  to  test  administrators  if 
problems  were  detected.  To  assess  the  accuracy  of  hands-on  scoring, 
shadow  scoring  was  conducted  as  a  quality  control  on  a  regular  basis. 
Discrepancies  among  administrators'  scoring  were  discussed  and 
resolved.  Administrators  were  rotated  across  testing  stations  to 
minimize  systematic  error  and  to  increase  administrator  motivation  and 
attention. 

Reliability 

Generalizability analyses [9] indicate that the HOPT used in this
research sampled enough tasks to have a relatively high G coefficient of
.83 using data from only 35 of the more than 70 tasks used to make up
the HOPT. The major sources of variation with these data concern tasks;
almost negligible error is associated with examiners [9]. These
analyses indicate that the procedures used to develop the HOPT were
successful in creating a test that can be generalized to the full domain
of infantry performance.

Interrater agreement, measured in percentages, is the number of
times raters agree on their markings, divided by the total number of
steps marked. Scorer agreement ranged from a low of 80 percent to a
high of 100 percent, depending on the task. The mean interrater
agreement was 90 percent. Agreement levels for the four MOSs were 0.90,
0.90, 0.89, and 0.90 for MOS 0311, 0331, 0341, and 0351, respectively.
These agreement levels compare favorably with other studies of hands-on
performance.
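
The agreement statistic defined above is simple to compute. The sketch below is illustrative only; the function name and the ten-step example are assumptions, with each step marked "go" or "no-go" by a primary scorer and a shadow scorer.

```python
def percent_agreement(rater_a, rater_b):
    """Percentage of steps on which two raters gave the same mark."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must mark the same nonempty set of steps")
    agreements = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100.0 * agreements / len(rater_a)


# Hypothetical markings for one ten-step task.
primary = ["go", "go", "no-go", "go", "go", "go", "no-go", "go", "go", "go"]
shadow = ["go", "go", "no-go", "go", "go", "go", "go", "go", "go", "go"]
print(percent_agreement(primary, shadow))
```

Note that simple percent agreement does not correct for agreement expected by chance, so it tends to run high when most steps are scored "go."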

Test-retest reliability is the degree to which people score the
same on a subsequent test administration. The reliability of the
hands-on measures was tested with 188 riflemen (MOS 0311) taking the
opposite form of the test seven to ten days after the initial
administration. For example, a rifleman who took the 68-item Form A
test originally would take the 70-item Form B test seven to ten days
later. The correlation (reliability estimate) between the two
administrations was 0.70. Significant carryover effects were found:
there was an average retest gain in performance of more than 0.8
standard deviation.

Internal consistency is the degree to which different items on a
test indicate the same level of proficiency. Cronbach alpha
coefficients, which are estimates of internal consistency, were computed
for each MOS and each test form. Alpha coefficients were 0.87 for MOS
0311 (n = 1,067), 0.87 for MOS 0331 (n = 257), 0.88 for MOS 0341 (n =
217), and 0.83 for MOS 0351 (n = 239). In no cases did alpha
coefficients for alternate forms vary by more than 0.02. These figures
indicate a high and stable degree of internal consistency for the
hands-on tests.
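
For readers unfamiliar with the statistic, a Cronbach alpha coefficient can be computed as in the following sketch. It uses the standard formula, alpha = k/(k-1) x (1 - sum of task variances / variance of total scores), with population variances throughout; the function name and the toy data are assumptions for illustration and are not the study's data.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """scores: one row per examinee, one column per task (or item)."""
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha requires at least two tasks")
    task_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores do not vary")
    return (k / (k - 1)) * (1 - sum(task_vars) / total_var)


# Toy data: four examinees, three tasks scored 0 or 1.
toy_scores = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(cronbach_alpha(toy_scores))
```

Alpha rises as tasks covary more strongly relative to their individual variances, which is why a long test of related tasks can reach values such as those reported above.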


Table A-1. Reliability estimates of hands-on test

                                   Reliability

                       Test-      Internal            Scorer
Test        Form       Retest     Consistency         Agreement

Hands-on    Form A     0.70       0.88 (69 tasks)     0.90
            Form B                0.86 (66 tasks)     0.90

DETAILS OF JKT DEVELOPMENT PROCEDURES AND RELIABILITY


Development 

The  paper-and-pencil  job-knowledge  tests  were  developed  to  parallel 
hands-on  content  as  much  as  possible.  Therefore,  hands-on  performance 
steps  were  used  as  a  basis  for  developing  the  job-knowledge  test, 
although  it  was  understood  that  the  time  allowed  for  hands-on  testing 
would  not  permit  all  hands-on  content  to  be  covered. 

Beginning  with  the  procedures  specified  in  the  hands-on  performance 
test,  critical  steps  were  identified  and  multiple-choice  questions  were 
written  concerning  those  steps.  The  items  stressed  what  and  how  steps 
are  performed  rather  than  why.  Whenever  possible,  illustrations  were 
used  to  maximize  the  fidelity  to  actual  performance  situations.  For 
tasks  that  were  more  cognitive  than  procedural  (e.g.,  tactical 
measures),  combat  scenarios  were  developed,  and  items  asked  what  should 
be done based on the information provided in the scenarios.

Items were reviewed by Marine Corps subject experts, then by
test-development experts for psychometric qualities. Written test forms
were developed corresponding to the content for Form A and Form B of the
general infantry hands-on test items. Content that was the same in both
hands-on test forms corresponded to the same written items. Two sets of
prospective items for each test form were developed and labeled A1, A2,
B1, B2. The order of duty-area presentation was varied between each
version of Form A or Form B, but the order remained the same within each
duty area. The initial draft of the job-knowledge test contained 175
items, with the expectation that information from the tryout would
eliminate some items. Only one written test form was developed to
parallel each of the MOS-specific parts of the hands-on tests.

The draft job-knowledge test questions were evaluated using
active-duty Marines. Seventy-one Marines completed 175 items for the
general infantry questions and 42 specific Rifleman (0311) items. Two
alternative versions of the general infantry test were administered.
Completion times were recorded, and item analyses pinpointed items that
were keyed incorrectly and items that should be deleted. Items were
dropped if a distractor correlated highly with total test score or if
the pass rate was abnormally low. Either of these conditions indicates
that the item was keyed incorrectly or was ambiguous enough that the
more knowledgeable Marines did not do better on the item.
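
The two screening rules described above can be sketched as follows. This is an illustration under stated assumptions, not the procedure actually used: the function names, the four-option format, and both thresholds (`min_pass`, `max_distractor_r`) are hypothetical, because the report does not give its cutoff values.

```python
import math
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation; returns 0.0 when either list has no variance."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def analyze_item(responses, key, totals, options="ABCD",
                 min_pass=0.25, max_distractor_r=0.20):
    """Return (pass_rate, distractor correlations, flagged) for one item.

    responses: option chosen by each examinee; totals: each examinee's total
    test score. Both thresholds are assumed values for illustration.
    """
    pass_rate = mean(1.0 if r == key else 0.0 for r in responses)
    distractor_r = {}
    for opt in options:
        if opt == key:
            continue
        chose = [1.0 if r == opt else 0.0 for r in responses]
        distractor_r[opt] = pearson_r(chose, totals)
    flagged = (pass_rate < min_pass
               or any(r > max_distractor_r for r in distractor_r.values()))
    return pass_rate, distractor_r, flagged


# Hypothetical item keyed "A" on which the highest scorers chose "B".
responses = ["B", "B", "A", "A", "C", "D"]
totals = [90, 85, 40, 45, 50, 42]
print(analyze_item(responses, "A", totals))
```

An item flagged because a distractor attracts the high scorers is a candidate for a keying error; an item flagged only for a low pass rate may simply be ambiguous or too hard.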

The  final  general  infantry  (0300)  test  consisted  of  150  items,  to 
be  completed  in  90  minutes.  The  number  of  MOS-specific  items  varied 
from  40  to  50. 




Reliability 


Test-retest reliability, the correlation between two
administrations of the same job-knowledge test given seven to ten
days apart to 189 riflemen, was 0.73 [10]. This degree of reliability
is adequate, but not particularly high.

Internal consistency, estimated by Cronbach alpha coefficients
computed for the job-knowledge test, was 0.89 for MOS 0311 (199 items,
n = 1,296), 0.89 for MOS 0331 (190 items, n = 306), 0.90 for MOS 0341
(189 items, n = 312), and 0.87 for MOS 0351 (190 items, n = 314). Alpha
coefficients for alternate test forms never differed by more than 0.02
for any MOS. These figures indicate that different parts of the
job-knowledge test were measuring the same skills.
DESCRIPTION OF SAMPLE-DIFFICULTY ANALYSES¹


Sample-difficulty analysis allows decomposition of the change in
HOPT-JKT discrepancies in observed percentage-correct scores by use of
the following algebra: Let $i = 1, \ldots, n$ denote the task, and
$j = 1, 2$ denote the testing mode ($j = 1$ for HOPT, $j = 2$ for JKT).
The percentage of correct steps for a given HOPT task will therefore be
denoted $r_{i1}$, and the percentage of examinees passing the set of JKT
items representing that task will be denoted $r_{i2}$. Suppose that
there are $n$ tasks for the duty area, $1, \ldots, n$, with the
percentage of tasks within the duty area represented by
$q_{11}, \ldots, q_{n1}$. Each task has an observed percentage of HOPT
steps correct $r_{11}, \ldots, r_{n1}$. In this case, the percentage of
HOPT steps performed correctly for the duty area, $P_{HOPT}$, can be
expressed as

$$\sum_{i=1}^{n} r_{i1} q_{i1} = P_{HOPT} \tag{1}$$

The percentage of items answered correctly is similarly computed as
the weighted average of percentage answering correctly to those items
that represent a given task for the $n$ tasks, where $q_{i2}$ is the
percentage of the duty area's items that represent task $i$. The
percentage of JKT items correctly answered for the duty area,
$P_{JKT}$, can be expressed as

$$\sum_{i=1}^{n} r_{i2} q_{i2} = P_{JKT} \tag{2}$$

The difference between the percentage of steps performed correctly
within the duty area and the percentage of items answered correctly is
$P_{HOPT} - P_{JKT}$, which can be expressed as

$$P_{HOPT} - P_{JKT} = \sum_{i=1}^{n} r_{i1}(q_{i1} - q_{i2}) + \sum_{i=1}^{n} q_{i1}(r_{i1} - r_{i2}) - \sum_{i=1}^{n} (r_{i1} - r_{i2})(q_{i1} - q_{i2}) \tag{3}$$

The first term of this expression is a measure of the change in
percentage correct that would have been expected if the same percentage
of items were answered correctly as task steps ($r_{i1}$ unchanged), but
the proportion of items was different from the proportion of tasks
($q_{i1} \neq q_{i2}$). Calculation of this "sampling" term yields the
expected discrepancy in duty-area average because of differences in
proportion of items compared to tasks. The remaining two terms are
associated with different difficulty levels of tasks and items
($r_{i1} - r_{i2}$). The second term represents "pure" difficulty
differences, whereas the third term indicates an interaction of
difficulty with sampling, plus error. This third term is called the
error component. A table showing an example of data layout and
computations is provided in table C-1.


1. Based on work of Cook [6].


Table C-1.  Example of data layout for sample-difficulty
            analyses (night-vision duty area, Form A)

                                  HOPT                          JKT
                       --------------------------   --------------------------
                       Average                      Percentage
                       percentage                   answering
                       of steps       Percentage    relevant      Percentage
                       performed      of tasks      items         of items
                                                    correctly
Task name              ($r_{i1}$)     ($q_{i1}$)    ($r_{i2}$)    ($q_{i2}$)

Visual inspection      .679           .3333         .266          .2500
Operations inspection  .653           .3333         .524          .2500
Clean components       .519           .3333         .382          .5000

NOTE:  For the night-vision duty area, there were three tasks
(n = 3) and four items. There was one item each for visual
inspection and operations inspection. There were two items
for clean components.


The sampling component for the night-vision duty area is computed
as

$$\sum_{i=1}^{3} r_{i1}(q_{i1} - q_{i2}) = .679(.3333 - .2500) + .653(.3333 - .2500) + .519(.3333 - .5000)$$

$$= .02 \text{ (when rounded to two figures)}$$
The difficulty component for night vision is computed as

$$\sum_{i=1}^{3} q_{i1}(r_{i1} - r_{i2}) = .3333(.679 - .266) + .3333(.653 - .524) + .3333(.519 - .382)$$

$$= .23 \text{ (when rounded to two figures)}$$

Note  that  all  of  the  terms  are  positive,  indicating  that  all  shifts 
result  in  the  job-knowledge  items  being  more  difficult  than  the 
corresponding  HOPT  tasks.  Therefore,  the  sign  is  positive. 

The final "error" term is computed as

$$\sum_{i=1}^{3} (r_{i1} - r_{i2})(q_{i1} - q_{i2}) = (.679 - .266)(.3333 - .2500) + (.653 - .524)(.3333 - .2500) + (.519 - .382)(.3333 - .5000)$$

$$= .02 \text{ (when rounded to two figures)}$$

Note  that  this  term  is  subtracted  from  the  other  two  in  equation  (3).  A 
large  error  term  would  have  negated  the  main  effects  for  sampling  or 
difficulty. 

Taken  as  a  whole,  the  three  terms  indicate  that  the  difference  between 
task  and  item  difficulties,  rather  than  sampling  or  interaction,  is  the 
primary  reason  that  the  JKT  average  scores  were  lower  than  for  the  HOPT. 
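
The night-vision example can be reproduced with a short sketch. It assumes each task is supplied as a tuple of the four table C-1 columns (HOPT percentage of steps correct, percentage of tasks, JKT percentage answering correctly, percentage of items); the function name `sample_difficulty` is hypothetical. The final line checks that the three components recombine to the observed HOPT-JKT difference, as equation (3) requires.

```python
def sample_difficulty(tasks):
    """Decompose P_HOPT - P_JKT into sampling, difficulty, and error terms.

    tasks: list of (r_hopt, q_hopt, r_jkt, q_jkt) tuples, one per task.
    """
    sampling = sum(r1 * (q1 - q2) for r1, q1, r2, q2 in tasks)
    difficulty = sum(q1 * (r1 - r2) for r1, q1, r2, q2 in tasks)
    error = sum((r1 - r2) * (q1 - q2) for r1, q1, r2, q2 in tasks)
    return sampling, difficulty, error


# Values from table C-1 (night-vision duty area, Form A).
night_vision = [
    (0.679, 0.3333, 0.266, 0.2500),  # visual inspection
    (0.653, 0.3333, 0.524, 0.2500),  # operations inspection
    (0.519, 0.3333, 0.382, 0.5000),  # clean components
]

sampling, difficulty, error = sample_difficulty(night_vision)
p_hopt = sum(r1 * q1 for r1, q1, _, _ in night_vision)
p_jkt = sum(r2 * q2 for _, _, r2, q2 in night_vision)

print(round(sampling, 2), round(difficulty, 2), round(error, 2))  # 0.02 0.23 0.02
# The error term is subtracted, matching equation (3).
print(abs((sampling + difficulty - error) - (p_hopt - p_jkt)) < 1e-12)
```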
APPENDIX F


LIST OF ITEMS WITH BELOW-CHANCE LEVELS OF CORRECT RESPONSES


                                                      Form A            Form B
                                                   -------------     -------------
Item name                                          Item     HOPT     Item     HOPT     Correct
                                                   mean     mean     mean     mean     response

Setting Azimuth at Night (LN06)                    20.3     24.9     24.1     27.4     C
Determine Azimuth One Pt to Another (LN24)         19.8     55.3     15.7     55.5     D
Determine Location by Intersection (LN02)          23.9     56.2     27.1              A
Measure Distance on a Map (LN10)                   20.5     58.2     21.0     56.3     A
Establish Landing Zone (TM11)                      21.3     53.3     19.6     53.5     C
Establish Landing Zone (TM13)                      24.4     53.3     26.0     53.5     A
Control Unit Movement When Not in Contact (TM17)   13.0     76.8     13.5     76.7     D
Control Unit Movement When Not in Contact (TM21)   25.2     76.8     24.2     76.7     D
Control Unit Movement When Not in Contact (TM34)   24.1     76.8     22.8     76.7     D
Call for and Adjust Indirect Fire (TM32)           20.1     38.1     24.1     39.6     C
Repair Wire of TA-312 (CM13)                       21.1     18.4     19.9     18.9     A
Administer Mouth-to-Mouth Resuscitation (FA7)      19.5     62.9     ****     ****     D
CPR (FA1)                                          23.9     40.5     25.9     43.2     C
CPR (FA4)                                          13.1     40.5     15.3     43.2     D
Administer First Aid for Abdominal Wound (FA5)     21.9     49.3     ****     ****     A
Prepare to Fire (LAW14)                            20.5     56.9     18.3     55.8     A
Take Immediate Action (LAW1)                        9.4     41.4      7.7     41.0     D
Drink While Masked (NBC1A)                         20.1     56.4     ****     ****     C
Inspect and Tag (SI12)                             19.6     67.4     17.9     67.6     D
Confirm Zero (GL12)                                21.7     44.0     19.7     43.8     D
Install Claymore Mine with Tripwires (MI04B)       ****     ****     18.3     21.1     C

NOTE:  Asterisks indicate that the task or item was not part
of the test for that form.
APPENDIX G


SCORESHEET FOR COMMUNICATIONS TASK 12, "CONSTRUCTING A
FIELD-EXPEDIENT ANTENNA"


Say:  This  test  covers  your  ability  to  construct  a  field-expedient 

antenna.  You  have  before  you  (indicate  ground  cloth  with 
equipment)  all  the  equipment  you  should  need.  Here  is  your 
assigned  frequency  (3x5  card  with  46.90  frequency).  You 
are  to  construct  a  1/2  wave  omni-directional  VHF  antenna.  Do 
you  have  any  questions  about  these  instructions?  Begin. 


NOTE  TO  SCORER:  Check  Marine's  answer  to  step  1.  If  answer  is  wrong, 
say:  Use  5  feet. 

PERFORMANCE STEPS                                      GO    NO-GO


1.  Used frequency reference chart and
    formula to determine correct length
    of wire.

2.  Cut antenna wire to 5 feet.

3.  Stripped approximately 3/4 inch of wire.

4.  Twisted the field wire antenna.

5.  Attached the bare ends of the antenna
    to the antenna connection of the radio
    by screwing the antenna base over the
    leads into the antenna mounting hole.

6.  Selected appropriate insulator
    (nonconductive).

7.  Tied antenna wire to one end of the
    insulator.

8.  Tied rope to other end of insulator.

9.  Threw rope over tree limb and raised
    antenna until it was vertical.

10.  Stripped about 2 inches off of each end
     of the ground wire.

11.  Cut 2-3 feet of wire for the ground.

12.  Stripped about 2 inches off of each end
     of the ground wire.

13.  Attached one end of the ground to any
     metal part of the radio.

14.  Drove metal stake into ground near radio.

15.  Attached other end of ground to metal
     stake.
APPENDIX H


HOPT SCORESHEET FOR CONTROL UNIT MOVEMENT WHEN NOT IN CONTACT


Say:  This test covers your ability to control unit movement when
      not in contact. You are a squad leader located at grid point
      277 529 (point). You must move to grid point 213 548
      (point). The enemy is located in the high ground northwest
      of the river (point).

      Enemy contact is not likely from your present location
      (point) to within small arms range of the town (point). When
      you are within small arms range, enemy contact changes to
      possible. When you come within small arms range of the
      treeline of Hill 437 (point), enemy contact changes to contact
      expected. Do you have any questions?

NOTE TO SCORER:  Repeat elements of scenario if asked.


PERFORMANCE STEPS                                      GO    NO-GO


Say:  Indicate your general direction of
      movement with this pointer.

1.  Marine indicated a northwesterly route.

NOTE TO SCORER:  Now lay the three white chips on
the predesignated spots.


Say:  Your squad, represented by this chip,
      is at this location (point to grid
      229 530). What formation should it
      be in?


2.  Indicated a good formation.

        Good Formation        Poor Formation

        Tactical column       Echelon (R or L)
        Wedge                 Line
                              Vee


Say:  Your squad is now here (point to grid
      224 539). What formation should it be
      in?
3.  Indicated a good formation.

        Good Formation        Poor Formation

        Echelon right         Tactical column
        Wedge                 Line
                              Vee
                              Echelon left

Say:  Your squad is now here (point to grid
      217 544). What formation should it be in?

4.  Indicated a good formation.

        Good Formation        Poor Formation

        Line                  Column
        Vee                   Echelon (R or L)
        Wedge

NOTE TO SCORER:  Remove the three white chips from the TACWAR board and
hand to Marine.

Say:  Each white chip represents a fire team of your squad. Assume
      the enemy is in this direction (point). Show me a wedge
      formation, using the chips on the board, and demonstrate the
      hand and arm signal.

5.  Indicated a wedge formation.

6.  Gave the proper hand and arm signal for a
    wedge.

Say:  Show me a column formation and demonstrate
      the hand and arm signal.

7.  Indicated a column formation.

8.  Gave the proper hand and arm signal for
    a column.

Say:  An echelon left.

9.  Indicated an echelon left formation.
10.  Gave the proper hand and arm signal for
     an echelon left.

Say:  A vee formation.

11.  Indicated a vee formation.

12.  Gave the proper hand and arm signal for
     a vee.

Say:  A line formation.

13.  Indicated a line formation.

14.  Gave the proper hand and arm signal for
     a line.
APPENDIX I


ITEM QUALITY CONTROL FOR JOB-KNOWLEDGE TESTS


PART I--Determining the Appropriate Test Mode


                                                   Yes    Maybe    No


Is the task appropriately measured
by means of a multiple-choice item?
(i.e., is the task knowledge-dependent,
reading-dependent, and time-independent?)

Is the task more appropriately measured
by means of a fill-in, alternate choice,
multiple true-false, essay, or testlet
format?

Does the task require physical coordination,
reflex responses, complex construction,
or complex reasoning?

Is the item appropriate for the skill level
of the Marines to be tested?

Does the item require a skill other than
what it is intended to measure (e.g.,
reading skills, understanding of notation,
memory for facts that can be easily looked
up in a manual)?

Does the item require less than what it is
intended to measure (e.g., does it
unnecessarily simplify a complex task into
a choice between alternatives)?


PART II--Item technical review (modified from Hambleton, 1980)


Test Item Characteristics (Mark "/" for              Test Item Numbers
Yes, "X" for No and "?" for unsure)                    1    2    3


Is  the  item  stem  clearly  written  for  the  intended 
Marines? 


Is  the  stem  free  of  irrelevant  material? 


Is  a  single  problem  clearly  defined  in  the  item 
stem? 


Are  the  answer  choices  clearly  written  for  the 
intended  group  of  Marines? 


Are  the  answer  choices  free  of  irrelevant  material? 


Is there a correct answer or a clearly best answer?


Have words like "always", "none", or "all" been
removed?


Are  likely  student  mistakes  used  to  prepare  incorrect 
choices? 


Is  "all  of  the  above"  avoided  as  an  answer  choice? 


Are  the  answer  choices  arranged  in  a  logical  sequence 
(if  one  exists)? 


Was  the  correct  answer  randomly  positioned  among 
the  available  choices? 


Are  all  repetitious  words  or  expressions  removed 
from  the  answer  choices  and  included  in  the  item  stem? 


Are  all  the  answer  choices  of  the  same  length? 


Do  the  item  stem  and  answer  choices  follow  standard 
rules  of  punctuation  and  grammar? 


Are  all  negatives  underlined? 


Are  grammatical  cues  between  the  item  stem 
and  the  answer  choices,  which  might  give  the 
correct  answer  away,  removed? 
Are letters used in front of the possible answers
to identify them?


Have expressions like "which of the following is
not" been avoided?


Disregarding any technical flaws that might exist in the test item, how
well do you think the content of the test item matches with the duty
area of the content defined by the domain specification? (1 = poor,
2 = fair, 3 = good, 4 = very good, 5 = excellent)


